206 research outputs found

    Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing

    Get PDF
    Background: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but because it requires examining gene lists of given sizes, it may be unstable.

    Results: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross-validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small proportion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross-validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross-validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings.

    Conclusions: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model-selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation.
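    To make the comparison concrete, a minimal cross-validation sketch is given below. It is an illustration only: the hold-out scheme, the squared-error loss, and the two toy "algorithms" (a raw mean fold change and a crudely shrunken fold change, with an arbitrary 0.8 factor standing in for a hierarchical empirical Bayes estimate) are assumptions, not the paper's estimators.

```python
import numpy as np

def cv_prediction_error(log_ratios, estimator, n_folds=5, seed=0):
    """log_ratios: (n_genes, n_replicates) array of log expression ratios.
    Hold out replicate columns in turn, predict each gene's ratio from the
    training replicates with `estimator`, and score the squared error of the
    prediction against the mean of the held-out replicates."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(log_ratios.shape[1])
    fold_errors = []
    for held_out in np.array_split(order, n_folds):
        train = np.setdiff1d(order, held_out)
        predicted = estimator(log_ratios[:, train])        # per-gene prediction
        observed = log_ratios[:, held_out].mean(axis=1)    # held-out estimate
        fold_errors.append(np.mean((predicted - observed) ** 2))
    return float(np.mean(fold_errors))

# Toy competitors to compare: lower cross-validated error is better.
raw_fold_change = lambda x: x.mean(axis=1)
shrunken_fold_change = lambda x: 0.8 * x.mean(axis=1)   # placeholder shrinkage
```

    Calling cv_prediction_error(data, raw_fold_change) and cv_prediction_error(data, shrunken_fold_change) then gives directly comparable error estimates for the two schemes.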

    A frequentist framework of inductive reasoning

    Full text link
    Reacting against the limitation of statistics to decision procedures, R. A. Fisher proposed for inductive reasoning the use of the fiducial distribution, a parameter-space distribution of epistemological probability transferred directly from limiting relative frequencies rather than computed according to the Bayes update rule. The proposal is developed as follows using the confidence measure of a scalar parameter of interest. (With the restriction to one-dimensional parameter space, a confidence measure is essentially a fiducial probability distribution free of complications involving ancillary statistics.) A betting game establishes a sense in which confidence measures are the only reliable inferential probability distributions. The equality between the probabilities encoded in a confidence measure and the coverage rates of the corresponding confidence intervals ensures that the measure's rule for assigning confidence levels to hypotheses is uniquely minimax in the game. Although a confidence measure can be computed without any prior distribution, previous knowledge can be incorporated into confidence-based reasoning. To adjust a p-value or confidence interval for prior information, the confidence measure from the observed data can be combined with one or more independent confidence measures representing previous agent opinion. (The former confidence measure may correspond to a posterior distribution with frequentist matching of coverage probabilities.) The representation of subjective knowledge in terms of confidence measures rather than prior probability distributions preserves approximate frequentist validity.
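    As a brief formal gloss on the coverage-matching property described above (the notation below is assumed for illustration and is not taken from the paper):

```latex
% Coverage matching for a confidence measure C_x of a scalar parameter theta:
% the probability the measure assigns to a one-sided hypothesis equals the
% frequentist coverage rate of the corresponding upper confidence bound.
\[
  C_x\bigl(\{\theta : \theta \le u_\alpha(x)\}\bigr) = \alpha
  \qquad\text{whenever}\qquad
  P_\theta\bigl(\theta \le u_\alpha(X)\bigr) = \alpha
  \ \text{ for all } \theta \text{ and all } \alpha \in (0,1).
\]
```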

    Copyright 2006 by the National Art Education Association, Studies in Art Education

    Get PDF
    A/r/tography is a form of practice-based research steeped in the arts and education. Alongside other arts-based, arts-informed and aesthetically defined methodologies, a/r/tography is one of many emerging forms of inquiry that refer to the arts as a way of re-searching the world to enhance understanding. Yet, it goes even further by recognizing the educative potential of teaching and learning as acts of inquiry. Together, the arts and education complement, resist, and echo one another through rhizomatic relations of living inquiry. In this article, we demonstrate rhizomatic relations in an ongoing project entitled "The City of Richgate," where meanings are constructed within ongoing a/r/tographic inquiries described as collective artistic and educational praxis. Rhizomatic relations do not seek conclusions and therefore, neither will this account. Instead, we explore a/r/tographical situations as methodological spaces for furthering living inquiry. In doing so, we invite the art education community to consider rhizomatic relations performed through a/r/tography as a politically informed methodology of situations. A/r/tography is an arts and education practice-based research methodology (Sullivan, 2004). The name itself exemplifies these features by setting art and graphy, and the identities of artist, researcher, and teacher (a/r/t), in contiguous relations. None of these features is privileged over another as they occur simultaneously in and through time and space. Moreover, the acts of inquiry and the three identities resist modernist categorizations and instead exist as post-structural conceptualizations of practice. In this article, we wish to describe a/r/tographical inquiry as a methodology of situations and, to do this, we share the journey of a collaborative project undertaken by a group of artists, educators, and researchers working with a number of families in a nearby city. The project is entitled "The City of Richgate" and examines issues related to immigration, place, and community within an artistically oriented inquiry. Although the project itself would be of interest to the field of art education, this article is dedicated to the elaboration of a/r/tography as a methodology of situations. The project provides a way of elaborating upon a/r/tography as a methodology that provokes the creation of situations through inquiry, that responds to the evocative nature of situations found within data, and that provides a reflective and reflexive stance to situational inquiries. These situations are often found, created, or ruptured within the rhizomatic nature of a/r/tography. It is on this basis that the article is premised: rhizomatic relationality is essential to a/r/tography as a methodology of situations. Deleuze and Guattari (1987) describe rhizomes metaphorically through the image of crabgrass that "connects any point to any other point" (p. 21) by growing in all directions. Through this image they stress the importance of the 'middle' by disrupting the linearity of beginnings and endings. After all, one fails to pursue a tangent if a particular line of thought is subscribed to. Rhizomes resist taxonomies and create interconnected networks with multiple entry points. Rhizomatic relationality affects how we understand theory and practice, product and process. Theory is no longer an abstract concept but rather an embodied living inquiry, an interstitial relational space for creating, teaching, learning, and researching in a constant state of becoming.

    How accurate and statistically robust are catalytic site predictions based on closeness centrality?

    Get PDF
    Background: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices.

    Results: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate that closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate from consideration residues with physicochemical properties unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 down to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined.

    Conclusion: Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the closeness centrality prediction results are robust to slight structural perturbations from molecular dynamics simulation.
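    A minimal sketch of this scoring scheme is given below; the 7 Å contact cutoff is an assumption (not stated in the abstract), networkx is used for the shortest-path computation, and the solvent-accessibility and residue-identity filters described above are omitted.

```python
import numpy as np
import networkx as nx

def closeness_scores(ca_coords, cutoff=7.0):
    """ca_coords: (n_residues, 3) array of alpha-carbon coordinates.
    Residues are vertices; edges join residues whose alpha-carbons lie within
    `cutoff` angstroms. For a connected graph the returned score is the
    reciprocal of a residue's average graph distance to all other residues."""
    coords = np.asarray(ca_coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    G = nx.Graph()
    G.add_nodes_from(range(len(coords)))
    i_idx, j_idx = np.nonzero(np.triu(dists <= cutoff, k=1))
    G.add_edges_from(zip(i_idx.tolist(), j_idx.tolist()))
    return nx.closeness_centrality(G)

# Candidate catalytic residues are the top-scoring positions, e.g.:
# scores = closeness_scores(coords)
# top5 = sorted(scores, key=scores.get, reverse=True)[:5]
```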

    Laparoscopic fistula excision and omentoplasty for high rectovaginal fistulas: a prospective study of 40 patients

    Get PDF
    AIM: The aim of this study was to prospectively evaluate 40 patients with a high rectovaginal fistula treated by laparoscopic fistula division and closure, followed by an omentoplasty. PATIENTS AND METHODS: Forty patients with a rectovaginal fistula between the middle third of the rectum and the posterior vaginal fornix, resulting from different causes (IBD, iatrogenic and birth trauma), were treated by laparoscopic excision of the fistula and insertion of an omentoplasty in the rectovaginal septum. The patients completed the gastrointestinal quality of life index questionnaire (GIQLI) and the Cleveland Clinic incontinence score (CCIS). All tests were performed at regular intervals after treatment. RESULTS: In 38 (95%) patients with a median age of 53 years (range 33-72), the surgical procedure was feasible. In two patients, the fistula was closed without an omentoplasty, and a diverting stoma was performed. The median follow-up was 28 months (range 10-35). Two patients (5%) developed a recurrent fistula. In one patient, the interposed omentum became necrotic and was successfully treated laparoscopically. In another patient, an abscess developed, which needed drainage procedures. The mean CCIS was 9 (range 7-10) before treatment and 10 (range 7-13) after treatment (p = 0.5, Wilcoxon). The median GIQLI score was 85 (range 34-129) before treatment and 120 (range 75-142) after treatment (p = 0.0001, Wilcoxon). CONCLUSIONS: Laparoscopic fistula excision combined with omentoplasty is a good treatment modality for high rectovaginal fistulas, with a high healing rate and an acceptable complication rate.

    The Statistics of Bulk Segregant Analysis Using Next Generation Sequencing

    Get PDF
    We describe a statistical framework for QTL mapping using bulk segregant analysis (BSA) based on high-throughput, short-read sequencing. Our proposed approach is based on a smoothed version of the standard statistic, and takes into account variation in allele frequency estimates due to the sampling of segregants to form bulks as well as variation introduced during the sequencing of the bulks. Using simulation, we explore the impact of key experimental variables such as bulk size and sequencing coverage on the ability to detect QTLs. Counterintuitively, we find that relatively large bulks maximize the power to detect QTLs even though this implies weaker selection and less extreme allele frequency differences. Our simulation studies suggest that, with large bulks and sufficient sequencing depth, the methods we propose can be used to detect even weak-effect QTLs, and we demonstrate the utility of this framework by application to a BSA experiment in the budding yeast Saccharomyces cerevisiae.
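    The core quantity being tracked is the allele frequency difference between bulks along the genome. The sketch below is an illustration only: the tricube kernel, the 50 kb bandwidth, and the raw frequency difference are assumptions and do not reproduce the paper's smoothed statistic or its variance model.

```python
import numpy as np

def smoothed_allele_freq_difference(pos, alt_high, cov_high, alt_low, cov_low,
                                    bandwidth=50_000):
    """Per-SNP allele frequency difference between the high and low bulks,
    estimated from alternate-allele read counts and total coverage, then
    smoothed along the chromosome with a tricube kernel of width `bandwidth`
    (positions in `pos` are assumed to come from a single chromosome)."""
    pos = np.asarray(pos, dtype=float)
    delta = (np.asarray(alt_high) / np.asarray(cov_high)
             - np.asarray(alt_low) / np.asarray(cov_low))
    smoothed = np.empty_like(delta, dtype=float)
    for k, p in enumerate(pos):
        w = np.clip(1.0 - (np.abs(pos - p) / bandwidth) ** 3, 0.0, None) ** 3
        smoothed[k] = np.sum(w * delta) / np.sum(w)
    return smoothed
```

    Peaks in the smoothed profile mark candidate QTL regions; the framework described above additionally models the sampling of segregants into bulks and the sequencing noise, which this sketch ignores.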

    Empirical Bayes analysis of single nucleotide polymorphisms

    Get PDF
    Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed, particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors.

    Results: In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes version 1.10.0 and later, which can be downloaded from http://www.bioconductor.org.

    Conclusion: As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can not only be used to analyze continuous gene expression data, but can also be applied to categorical SNP data, where the response is not restricted to be binary. In association studies in which typically several tens to a few hundred SNPs are considered, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (prespecified) interactions/genotypes can also be used to quantify the importance of these interactions.
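    To give a rough idea of the underlying empirical Bayes logic (this is a generic Python illustration, not the siggenes implementation; the chi-square statistic, its two-degree-of-freedom null, the fixed prior null proportion p0, and the histogram density estimate are all assumptions):

```python
import numpy as np
from scipy import stats

def posterior_prob_associated(chisq, p0=0.9, bins=50):
    """chisq: per-SNP Pearson chi-square statistics of genotype x group
    (3 genotypes x 2 groups, so 2 degrees of freedom assumed for the null).
    Estimates the mixture density f(z) with a crude histogram, takes f0 as
    the theoretical null density, and returns 1 - p0 * f0(z) / f(z), an
    empirical-Bayes-style posterior probability of association per SNP."""
    z = np.asarray(chisq, dtype=float)
    hist, edges = np.histogram(z, bins=bins, density=True)
    idx = np.clip(np.digitize(z, edges) - 1, 0, bins - 1)
    f = np.maximum(hist[idx], 1e-12)      # empirical mixture density f(z)
    f0 = stats.chi2.pdf(z, df=2)          # assumed null density f0(z)
    return np.clip(1.0 - p0 * f0 / f, 0.0, 1.0)
```

    In the actual method the null proportion p0 and the densities are estimated from the data rather than fixed, but the reported posterior probabilities take the same 1 - p0 f0/f form.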

    Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    Get PDF
    Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, over the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions.

    Results: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data set. This method, called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for differential gene expression. Both the list of genes provided by CASh and that provided by the t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, the biological interpretation of the differences between the two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for the detection of differential gene expression variability.

    Conclusion: CASh is successfully applied to gene expression analysis of a data set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways.
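    The Shapley value itself is typically approximated by sampling, since exact computation is exponential in the number of genes. Below is a minimal Monte Carlo sketch; the characteristic function is a placeholder and does not reproduce the microarray game defined in the paper.

```python
import numpy as np

def monte_carlo_shapley(value_fn, n_players, n_perm=2000, seed=0):
    """Approximate each player's Shapley value as the average marginal
    contribution value_fn(S union {i}) - value_fn(S) over random orderings."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_players)
    for _ in range(n_perm):
        order = rng.permutation(n_players)
        coalition, prev = set(), value_fn(set())
        for i in order:
            coalition.add(i)
            cur = value_fn(coalition)
            phi[i] += cur - prev
            prev = cur
    return phi / n_perm

# Toy game in which any coalition containing at least two genes "wins":
# phi = monte_carlo_shapley(lambda S: float(len(S) >= 2), n_players=5)
```

    The bootstrap comparison described above then resamples the arrays within each condition, recomputes the per-gene Shapley values, and asks whether the between-condition differences exceed what resampling alone would produce.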

    High-Throughput Sequencing to Reveal Genes Involved in Reproduction and Development in Bactrocera dorsalis (Diptera: Tephritidae)

    Get PDF
    BACKGROUND: Tephritid fruit flies in the genus Bactrocera are of major economic significance in agriculture, causing considerable losses to the fruit and vegetable industry. Currently, there is no ideal control program. Molecular approaches are an effective means of pest control at present, but genomic and transcriptomic data for members of this genus remain limited. To facilitate molecular research into reproduction and development mechanisms, and ultimately effective control of these pests, an extensive transcriptome for the oriental fruit fly Bactrocera dorsalis was produced using the Roche 454-FLX platform. RESULTS: We obtained over 350 million bases of cDNA derived from the whole body of B. dorsalis at different developmental stages. In a single run, 747,206 sequencing reads with a mean read length of 382 bp were obtained. These reads were assembled into 28,782 contigs and 169,966 singletons. The mean contig size was 750 bp and many nearly full-length transcripts were assembled. Additionally, we identified a large number of genes involved in reproduction and development as well as genes that represent nearly all major conserved metazoan signal transduction pathways, such as insulin signal transduction. Furthermore, transcriptome changes during development were analyzed. A total of 2,977 differentially expressed genes (DEGs) were detected between the larval and pupal libraries, while there were 1,621 DEGs between adults and larvae, and 2,002 between adults and pupae. These DEGs were functionally annotated with KEGG pathway annotation and 9 genes were validated by qRT-PCR. CONCLUSION: Our data represent an extensive sequence resource for B. dorsalis and provide, for the first time, access to the genetic architecture of reproduction and development as well as major signal transduction pathways in tephritid fruit fly pests, allowing us to elucidate the molecular mechanisms underlying courtship, oviposition, and development, and to perform detailed analyses of the signal transduction pathways.
    • …